Imputation in Missing Not at Random SNPs Data Using EM Algorithm

Authors

Mahmood Alipour Heidari, Department of Biostatistics, Faculty of Medical Sciences, Tarbiat Modares University, Tehran

Hamid Alavi Majd, Department of Biostatistics, Faculty of Paramedical Sciences, Shahid Beheshti University of Medical Sciences, Tehran

Ebrahim Hajizadeh, Department of Biostatistics, Faculty of Medical Sciences, Tarbiat Modares University, Tehran

Kamal Azam, Department of Epidemiology and Biostatistics, School of Public Health, Tehran University of Medical Sciences, Tehran

Abstract

The relation between single nucleotide polymorphisms (SNPs) and certain diseases has attracted the attention of many researchers, and missing SNPs are quite common in genetic association studies. This article investigates the relation between the SNPs in DNMT1 on human chromosome 19 and colorectal cancer, and aims to present an imputation method for SNPs that are missing not at random. In this case-control study, 100 patients with colorectal cancer who consulted the Research Institute for Gastroenterology and Liver Disease of Shahid Beheshti University of Medical Sciences formed the case group, and 100 other patients who consulted the same institute formed the control group. A genetic test was applied to identify the genotype of the 6 SNPs of DNMT1 on chromosome 19 for all patients under investigation. The data were analyzed using logistic regression. A fraction of the data was then deleted, both at random and not at random, the missing values were imputed with the EM algorithm, and the logistic regression coefficients before and after imputation were compared. The results imply that under both mechanisms, SNPs missing at random and not at random, the logistic regression coefficients estimated after EM imputation correspond more closely to the results from the complete data than those obtained by eliminating the missing values.
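The workflow described in the abstract (fit on complete data, delete values, impute by EM, compare coefficients) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the form of the EM algorithm, so the sketch uses a single simulated SNP coded 0/1/2, an EM scheme known as the method of weights, and a model that ignores the missingness mechanism (so under not-at-random deletion it is an approximation at best). All function names, sample values, and deletion probabilities are hypothetical.

```python
import numpy as np

def fit_logistic(X, y, w=None, n_iter=50):
    """Weighted logistic regression fitted by Newton-Raphson."""
    w = np.ones(len(y)) if w is None else w
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (y - p))
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X
        # Tiny ridge term keeps the linear solve numerically stable.
        step = np.linalg.solve(hess + 1e-8 * np.eye(X.shape[1]), grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

def design(g):
    """Design matrix: intercept plus additive genotype code."""
    return np.column_stack([np.ones(len(g)), g])

def em_logistic(g, y, n_iter=200):
    """EM (method of weights) for a genotype g in {0,1,2} with NaN = missing.

    Assumption: the missingness mechanism is ignored in the likelihood.
    """
    miss = np.isnan(g)
    levels = np.array([0.0, 1.0, 2.0])
    n_obs = int((~miss).sum())
    # Start from complete-case estimates.
    beta = fit_logistic(design(g[~miss]), y[~miss])
    freq = np.array([np.mean(g[~miss] == k) for k in levels])
    # Augmented data: observed rows once, each missing row once per genotype.
    g_aug = np.concatenate([g[~miss], np.tile(levels, int(miss.sum()))])
    y_aug = np.concatenate([y[~miss], np.repeat(y[miss], 3)])
    for _ in range(n_iter):
        # E-step: posterior probability of each genotype given the outcome.
        p = 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * levels)))
        lik = np.where(y[miss][:, None] == 1, p, 1.0 - p) * freq
        post = lik / lik.sum(axis=1, keepdims=True)
        w = np.concatenate([np.ones(n_obs), post.ravel()])
        # M-step: weighted logistic fit and updated genotype frequencies.
        new_beta = fit_logistic(design(g_aug), y_aug, w)
        freq = np.array([w[g_aug == k].sum() for k in levels]) / len(y)
        done = np.max(np.abs(new_beta - beta)) < 1e-8
        beta = new_beta
        if done:
            break
    return beta, freq

# Simulated case-control-like data (hypothetical values, not the study data).
rng = np.random.default_rng(0)
n = 200
g = rng.choice([0.0, 1.0, 2.0], size=n, p=[0.5, 0.4, 0.1])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(-0.5 + 0.8 * g)))).astype(float)
beta_full = fit_logistic(design(g), y)

# Deletion at random (here: completely at random) and not at random
# (here: the deletion probability depends on the genotype itself).
masks = {
    "at random": rng.random(n) < 0.2,
    "not at random": rng.random(n) < np.where(g == 2.0, 0.6, 0.1),
}
results = {}
for name, mask in masks.items():
    g_miss = np.where(mask, np.nan, g)
    beta_cc = fit_logistic(design(g[~mask]), y[~mask])  # eliminate missing rows
    beta_em, _ = em_logistic(g_miss, y)                 # EM-based estimate
    results[name] = (beta_cc, beta_em)
    print(name, "| complete-case:", beta_cc, "| EM:", beta_em)
print("complete data:", beta_full)
```

Comparing each pair of estimates against `beta_full` mirrors the comparison the abstract reports. Which approach lands closer on simulated data depends on the seed and on the assumed deletion probabilities, so the sketch makes no claim about the outcome.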


Similar resources

Missing data imputation in multivariable time series data

Multivariate time series data are found in a variety of fields such as bioinformatics, biology, genetics, astronomy, geography and finance. Many time series datasets contain missing data. Multivariate time series missing data imputation is a challenging topic and needs to be carefully considered before learning or predicting time series. Frequent researches have been done on the use of diffe...


Missing value imputation on missing completely at random data using multilayer perceptrons

Data mining is based on data files which usually contain errors in the form of missing values. This paper focuses on a methodological framework for the development of an automated data imputation model based on artificial neural networks. Fifteen real and simulated data sets are exposed to a perturbation experiment, based on the random generation of missing values. These data set sizes range fr...


Simple imputation methods were inadequate for missing not at random (MNAR) quality of life data

OBJECTIVE QoL data were routinely collected in a randomised controlled trial (RCT), which employed a reminder system, retrieving about 50% of data originally missing. The objective was to use this unique feature to evaluate possible missingness mechanisms and to assess the accuracy of simple imputation methods. METHODS Those patients responding after reminder were regarded as providing missin...


Imputation methods for quantile estimation under missing at random

Imputation is frequently used to handle missing data for which multiple imputation is a popular technique. We propose a fractional hot deck imputation which produces a valid variance estimator for quantiles. In the proposed method, the imputed values are chosen from the set of respondents and are assigned with proper fractional weights that use a density function for the working model. In addit...


Imputation of Missing Values for Unsupervised Data Using the Proximity in Random Forests

This paper presents a new procedure that imputes missing values by random forests for unsupervised data. We found that it works pretty well compared with k-nearest neighbor (kNN) and rough imputations replacing the median of the variables. Moreover, this procedure can be expanded to semisupervised data sets. The rate of the correct classification is higher than that of other conventional method...


Multiple Imputation for Missing Data

Multiple imputation provides a useful strategy for dealing with data sets with missing values. Instead of filling in a single value for each missing value, Rubin’s (1987) multiple imputation procedure replaces each missing value with a set of plausible values that represent the uncertainty about the right value to impute. These multiply imputed data sets are then analyzed by using standard proc...



Journal:

Journal of Paramedical Sciences

Volume 2, Issue 3, pages 0-0

Keywords

EM algorithm; single nucleotide polymorphisms (SNPs); colorectal cancer; DNMT1; human chromosome 19; logistic regression; missing value
